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Abstract Efficient algorithms for searching for opti- 
mal saturated designs are widely available. They maxi- 
mize a given efficiency measure (such as D-optimality) 
and provide an optimum design. Nevertheless, they do 
not guarantee a global optimal design. Indeed, they 
start from an initial random design and find a local 
optimal design. If the initial design is changed the op- 
timum found will, in general, be different. A natural 
question arises. Should we stop at the design found or 
should we run the algorithm again in search of a bet- 
ter design? This paper uses very recent methods and 
software for discovery probability to support the deci- 
sion to continue or stop the sampling. A software tool 
written in SAS has been developed. 

Keywords Design of experiments • Optimal designs • 
Unobserved species • Discovery probability 



1 Introduction 

In the design of experiments, optimal designs, or opti- 
mum designs, are a class of experimental designs that 
are optimal with respect to a given statistical criterion. 

In this paper we focus on saturated optimum designs 
(SOD). Saturated designs contain a number of points 
that is equal to the number of parameters of the model. 
It follows that SODs are often used in place of standard 
designs, such as orthogonal fractional factorial designs, 
when the cost of each experime ntal run is high. Mai n 



references t o this t opic include I Atkinson et all (2007) 



Pukelsheiml d2006l) . IShah and Sinhal dl989h and 
197i 



Wvnn 



R. Fontana 

Department of Mathematical Sciences - Politecnico di Torino 
Tel.: +39-011-0907504 E-mail: roberto.fontana@polito.it 



The optimality of a design depends on the statisti- 
cal model that is assumed and is assessed with respect 
to a statistical criterion, which, for information-based 
criteria, is related to the variance-matrix of the model 
parameter estimators. Well-known and commonly used 
criteria are A-optimality and D-optimality. 

Widely used statistical systems like SAS and R have 
procedures for finding an optimal design according to 
the user's specifications. In this pa per we will refer to 
Proc Optex of SAS/QC |sai (|2010h b but the approach 
can be adopted for other software. 

The Optex procedure searches for optimal exper- 
imental designs. The user specifies an efficiency crite- 
rion, a set of candidate design points, a linear model and 
the size of the design to be found and the procedure gen- 
erates a subset of the candidate set so that the terms in 
the model can be estimated as efficiently as possible. By 
default, the standard output of the procedure is a list 
of 10 designs that are found as the result of 10 runs of 
the ex change search algorithm (jMitchell and Miller Jr 
( 1970l) ) starting each time from an initial completely 
randomly chosen design. 

The number of times that we decide to run the 
search algorithm is crucial. Obviously, if we increase 
it, in general we will explore different local optima with 
the possility to find better designs. On the other hand, 
sometimes, the extra time that we use to explore other 
possibilities is wasted because new optima do not ex- 
ist. This work aims at developing a methodology that 
could support the user in making the decision whether 
to stop or continue the search. 

The paper is organized as follows. In Sect. [2] we 
state the problem of finding new optimal designs as the 
problem of finding new species in a population. Then, 
in Sect. [3J using some examples, we describe how our 
methodology, which is based on the estimator of the 
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discovery probability, could be used for optimal design 
generation. In Sect.[3]we describe the algorithm in more 
detail. The software code that has been developed is 
written in SAS, is available on request and can be used 
for any choice of factors, levels and model. Concluding 
remarks are in Sect. [5j 



2 Optimal designs vs richness of species 

We consider the following setting that is quite common 
in optimal design problems. 

We have d factors, A\, . . . , Ad- The factor A; has s$ 
levels coded with the integer 0, . . . , Sj — 1, i = 1, . . . , d. 
The full factorial design is T> = {0, . . . , s% — 1} X . . . X 
{0, . . . , s m - 1}. For each point ( = . . . , Crf) of V 
we consider a real- valued random variable Y^ u ___^ d . We 
make the hypothesis that the means of the responses, 
E [Y] where Y is the column vector [If ; C € 2?] can be 
modeled as 



E [Y] = X v p , 



(1) 



where Xd is the non-overparametrized design matrix, 
as it will be defined in Sect. 12.11 and f3 is the subset of 
all the effects (constant effect, main effects and interac- 
tions) that are supposed to affect the response Y . 

Given an efficiency criterion </>, a saturated optimal 
design (</>-SOD) is a subset of the full factorial design 
T> = {0, . . . , si — l}x . . . x {0, . . . , s m — 1}, whose size is 
equal to the number of degrees of freedom of the model 
(fT]) and that maximizes this criterion <f>. In this paper we 
focus on information-based criteria and, in particular, 
on D-optimality but other criteria can be chosen (like 
Aoptimality and G-optimality) . We denote this type of 
problem with the triple (V,At,<fi) where V is the full 
design, M. is the hypothesized model (see Eq. [T]) and 4> 
is the optimality criterion. 

Given a subset J- of T>, the information matrix is 
defined as X'-pXjr where Xjr is the design matrix cor- 
responding to T and X' is the transpose of X. D- 
optimality aims at maximizing Djr, the determinant of 
the information matrix 



D r = dct(A^A» 



(2) 



There are several algorithms for searching for D- 
optimal designs. They have a common structure. They 
start from an initial design, randomly generated or user 
specified, and move, in a finite number of steps, to a 
better design. In general, if a different initial design is 
chosen, a different optimal design is found. 

It follows that, given an algorithm a, a population 
A® of D-optimal designs can be defined. This popula- 
tion is made up of all the saturated designs that are the 



result of the execution of the algorithm a and is a sub- 
set of all the subsets of T> of size equal to the number 
of degrees of freedom of the model. 

The elements of A® can be classified into species, 
according to the criterion for which T\ € A® and T% G 
A® are of the same species if and only if they have the 
same value in terms of the D criterion, Djr % = D^ 2 . 

We observe that, as proved in Proposition [U iso- 
morphic designs belong to the same species, while, in 
general, the opposite is not true because there are de- 
signs with the same value of the D criterion but that are 
not isomorphic. As is known two designs are isomorphic 
if one can be obtained from the other by relabeling the 
factors, reor dering the runs, and sw itching the levels of 
factors, e.g. IClark and Deanl ([20011 ). 



Proposition 1 Let us consider T\ C T> and C V. 

If T\ and T2 are isomorphic then Dj: x = D^ 2 . 

Proof We separately analyse row/column permutations 
and the switching of the levels of some factors. If Ty, is 
obtained permuting the rows and/or the columns of J-\ 
it follows that 

Ajf 2 = RXjr % C 

where R and C are permutation matrices. Then 

= det((A^ 2 A> 2 )) - (det(i?)) 2 det((^ i ^ 1 ))(det(C)) 2 

= D 

being det(i?) = det(C) = 1. A similar argument holds 
for switching the levels of some factors. □ 

Studying the species of A® or, in general, of A^, 
where cf> is an optimal criterion, is interesting for op- 
timal design generation. Let us consider the problem 
(V, Ai, <fi) and let us choose an algorith a to search for 
0-SODs. If we run this algorithm n times, each time 
starting from a completely random initial design, we 
will get a sample of n elements of A%. Such elements 
can be classified in k n < n different species accord- 
ing to the value of the criterion 4>. Recent methods fo r 
discovery probability estimation, iFavaro et a 3 (I2012L 
can be applied to the vector (^x, I2, ■ ■ ■ > &n) where £ r 
is the number of species in the sample with frequency 
r, r = 1, . . . , n. In particular, based on a sample of size 
n, for any additional unobserved sample size m > and 
for any frequency k = 0, . . . , n + m, these methods pro- 
vide, an explicit estimator for the probability U n + m (k) 
that the (n + m + l)-th observation coincides with a 
species whose frequency, within the sample size n + m, 
is exactly k. The case m = k — corresponds to as- 
sessing the probability of finding a new species in the 
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subsequent observation, that in the context of optimal 
designs, is the probability of finding a saturated design 
with a different value of the criterion <fi in the subse- 
quent run of the algorithm. If this probability U n +o (0) 
is sufficiently high (let us say greater than 0.1 or even 
0.05) it would be convenient to run the algorithm again 
because it is likely that we could find a new optimal de- 
sign. If we found a new design, it could have a greater 
value of 4> and this obviously represents an improvement 
to our optimization process. Even if this new design did 
not have an higher value of <j) than the existing ones, this 
would give the possibility to increase the known part of 
A%. In particular, for D-optimal designs, from Propo- 
sition [TJ we know that designs with different values of 
Djr are non-isomorphic designs. It is quite common, 
in practical applications, to choose a design where the 
optimal criterion has a slightly smaller value than the 
maximum obtained but which has other better charac- 
teristics, such as space filling properties. The knowledge 
of a set of non-isomorphic designs can also be used for 
non p aram etric testing proced ures, Giancristofaro et~al 
( 20lj) and lBasso etal (|2004h . 



2.1 The design matrix 

The design matrix X%> in Eq. [T]is built as follows. 

— The first column is equal to 1 and corresponds to the 
constant effect, denoted by [i. The constant effect is 
always considered as a term of the model. 

— If the main effect of the factor Ai is to be considered 
in the model, the corresponding si — 1 columns are 
computed as follows. For a design point with Ai at 
its fc-th level 

— if 1 < k < Si — 1 the columns are all except for 
the fc-th column that is 1; 

— if k = Si the columns are all —1 

— If an interaction A^ ★ . . . ★ Ai k is to be considered 
in the model, the corresponding (s^ — 1) • . . . ■ (si k — 
1) columns are computed by taking the horizontal 
direct product of the colummns corresponding to 



the main effects of A 



■ ; Ai 



This coding corresponds to modeling without over pa- 
rametrization and X-p is full rank. 

For a subset T of V, the design matrix Xjr is simply 
built deleting from Xp the rows that correspond to the 
points of V that are not in T . 



reader should refer to the original paper for a detailed 
description of the methodology. 

Given a sample of size n, (£i, . . . , £ n ), where £ r is the 
frequency of species that have been observed r-times in 
the sample, r = 1, . . . , n. We have Y^i=i = n - We 
denote the number of different species that have been 
observed in the sample by j. We get J27=i = 

Based on a sample of size n, for an additional un- 
observed sample size to > and for any frequency 
k = 0, . . . , n + to, using a non parametric Bayesian ap- 
proach, Favaro et al provide an estimator for the prob- 
ability J7„, m that the (n + m + l)-th observation coin- 
cides with a species whose frequency, within the sample 
of size n + to, is exactly fc. 

We are interested in discovering new species, that 
correspond to the case k = 0. 

From Section 2 of on p. 1190 we obtain 



two) 



V, 



where, for the two-parameter Poisson-Dirichlet process, 



we have V nd = UUii + + l)n-i. ° ^ (0,1), 



6 > —a. The symbol (a) n denotes the n-th ascending 
factorial of a, (a) n = a(a + 1) . . . (a + n — 1), (a)o = 1. 
It follows that 



■ 3° 



E4+o(0) = 

1/ T II 

and, for to > 0, we obtain 

6 + ja (6 + n + a), 



U n+m (0) 



n (9 + n+ 1)., 



The estimates a, 9 of a, 9 are obtained as 



are max 

(a,9) 



1 )n-l 



-Y 



(3) 



Using (9, a) we finally obtain the estimates of the 
discovery probability at the (n + l)-th observation 



£W0) 



■ jo- 



(4) 



2.2 Discovery probability 

We briefly summa rize the main resul ts that are used 
in this work, as in iFavaro et all (|2012l ). The interested 



and at the (n + to + l)-th observation, to > 0, 



9 + ja (6 + n + a), 
9 + n (§ + n + l)r. 



(5) 
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3 Methodology and Applications 

We repeat the search for optimal designs to analyse the 
population A® of Z?-optimal designs that can be found 
for a given problem using a predefined algorithm. Each 
time the algorithm starts from a randomly chosen initial 
design. We set a maximum number of iterations equal 
to M* and we continue the process until the estimate 
of the discovery probability at the subsequent observa- 
tion goes under a given threshold or the maximum 
number of iterations is reached. 

The procedure can be described as follows. A prob- 
lem (T>, A4, </>), with 4> — D in our examples, is defined 
and an algorithm a for 0-optimal design generation is 
chosen. For each iteration s, s = 1, . . . , M+, 

1. using the algorithm a, a ^-optimal saturated design 
T s is obtained; 

2. the values of the ^-criterion of J- s is computed; 

3. the vector (£±, . . . , £ s ) is built, where £ r is the num- 
ber of species with frequency r, r = 1, . . . , s; 

4. an estimate (<r s ,9 s ) is obtained, see Eq. O 

5. an estimate of U s +o(0) is computed using Eq. 2J 

6. if U s +o(0) < p+ the algorithm stops, otherwise the 
next iteration s + 1 is performed (if s + 1 > M+ the 
algorithm stops). 

The main output of the algoritm is a set of designs, 
where each design belongs to a different species, i.e. 
has a different value of the 0-criterion. 

We show how the methodology works using the fol- 
lowing problem. Let us consider 7 factors, each with 2 
levels and the model that contains the overall mean, the 
main effects and all the 2-factor interactions for a total 
of 1 + 7 + 21 = 29 degrees of freedom. We search for 
saturated .D-optimal designs that is D-optimal designs 
that contains 29 points. 

We use Proc Qptex Isasl ( 2010l ) with the exchange 
method, which is its default search method. With the 
default setting, the algorithm starts from 10 initial ran- 
domly chosen designs providing 10 _D-optimal designs. 
We consider the design with the highest value of the 
D-efficiency of the 10 optimal designs as the optimal 
design found by the algorithm. 

Setting the seed that is used for the random gen- 
eration of the initial designs at 6789, the best of the 
10 optimal designs, that we denote by has Djr t = 
9.0911i;39 and Ej? = 82.3162, where E®, the D-efficiency 
of a J 7 , is defined as 



Table 1 Number l r of D optimal designs that have found r 
times, r = 1, . . . , 493; only t r ^ are shown. 



r 


i r 


1 


47 


2 


18 


3 


7 


4 


10 


5 


2 


6 


4 


9 


2 


11 


1 


12 


1 


14 


2 


15 


1 


16 


1 


17 


2 


20 


1 


36 


1 


39 


1 


40 


1 


46 


1 


Total 


103 



Now we run the procedure above with M* = 1, 000 
and p± — 0.10. 

After 493 runs, the estimate of the discovery proba- 
bility at the next observation becomes lower than = 
0.10 and the algorithm stops (LWh)(0) ~ 0.099). We 
find 103 different local Z?-optimal designs. All these de- 
signs are not isomorphic (Proposition[T]). The maximum 
(minimum) value of inefficiency is 85.6265 (78.9605). 

We decide to continue the search for new species 
choosing = 0.05 and = 2,000. The latter value 
is chosen taking into account that using Eq. [S] we get 
EWiooo(O) = 0.049 and ^493+2000(0) = 0.035. We 
observe that these supplementary runs are added to 
the previous ones. 

After 1, 271 supplementary runs the estimate of the 
discovery probability at the next observation becomes 
lower than 0.05, f/i 76 4+o(0) « 0.0499. After 1,271 + 
493 = 1,764 simulations we observe 191 different D- 
optimal designs. The maximum value of £>-efficiency is 
still 85.6265, while the minimum is 78.1134. 



100 x 



1 



where is the number of runs of T that coincides 
with the degrees of freedom of the model for saturated 
designs. 



W e can now use the Fedorov algorithm, iFedorov 
( 1972h . that is considered more reliable, even if slower, 
than the exchange algorithm. We keep the standard set- 
ting for which, at each iteration, 10 optimal designs are 
generated and the one among them that has the highest 
D-efficiency value is taken as the optimal design. 

We choose 3456 as the initial seed. The first itera- 
tion provides an optimal design JF\ with = 82.7079. 
Now we repeat the procedure with M* = 1,000 and 
p+ = 0.10. After only 18 iterations, as ?7i8+o(0) ~ 
0.087, the algorithm stops, with 4 different designs. The 
maximum (minimum) value of D-efficiency is 83.9844 
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(82.4212). We have empirical evidence that the Fedorov 
algorithm is more stable than the exchange algorithm. 
We observe that the best design found with the ex- 
change algorithm, that has D-emciency equal to 85.6265, 
is not found in this first sample. We were able to find 
it running the algorithm again with M* = 1,000 and 
p* = 0.01. 

4 The algorithm 

In this section we provide a detailed description of the 
algorithm that has been developed to study the popu- 
lation Aa that contains all the D-optimal designs that 
can be found by the algorithm a. 

A problem (T>, M,<fi — D) is defined and an algo- 
rithm a for D-optimal design generation is chosen. The 
set of candidates that, in our setting, is the full factorial 
design is generated using an ad-hoc module written in 
SAS/IML. The algorithm a can be chosen from a list 
of methods that includes the exchange algorithm and 
the Fedorov algorithm. 

For each iteration s, s = 1, . . . , M*, 

1. using the algorithm a, a D-optimal saturated design 
F s is obtained; 

2. the value of the D-efficiency, E® , of JF S is computed; 

3. the vector . . . , £ s ) is built, where l r is the num- 
ber of species with frequency r, r = 1, . . . , s; 

4. an estimate (o s ,9 s ) is obtained, see Eq. [31 

5. an estimate of U s+ q(0) is computed using Eq. |4j 

6. if J7 s +o(0) < p± the algorithm stops, otherwise the 
next iteration s + 1 is performed (if s + 1 > M* the 
algorithm stops). 

The main output of the algoritm is a set of designs, 
where each design belongs to a different species, i.e. has 
a different value of the Z)-criterion. 

4.1 Steps 1 and 2 

At iteration s, with the chosen algorithm a, the Proc 
Optex procedure is used to generate a D-optimal de- 
sign, J" s . The species of T s is the value of its Z)-efficiency, 
E^ . The value of the efficiency is rounded to four dec- 
imal digits to avoid creating different species from nu- 
merical effects. 

4.2 Step 3 

Using all the designs T\ , . . . , T s with their correspond- 
ing inefficiencies, Ejl , . . . , E® the vector (£i, . , . , £ s ) is 
built, where £ r is the number of species with frequency 
r,r = l,...,s. 



4.3 Step 4 

An estimate (a s ,9 s ) must be obtained searching for 
(a, 9), a £ (0,1), 9 > — er that maximizes f(cr, 9), (see 
Eq.©, 

v ' i—l 

The Genetic Algorithm module of SAS/IML has 
been used. In order to manage the constraints a 6 
(0,1), 9 > —a the search has been performed in the 
region TZ = [S, 1 - 8] X [-(1 - 6), T M ] with 6 = 0.01 and 
Tm = 1,000. This region contains the non- feasible re- 
gion made by the points inside the simplex S = TZ n 
{(a, 9) : 9 < — a} whose vertices are (S, —(1—5)), (S, —5) 
and (1 — S, — (1 — 5)). We observe that the edges of S 
contain non-feasible points. 

We decided to manage this constraint with the pen- 
alty method, because this method usually works well 
when most of the points in the solution space do not 
violate the constraints, as in our problem. The way in 
which the penalty in the objective function for unsatis- 
fied constraints has been imposed is described here. 

From the point of view of the search of the point 
(cr^,^) that maximizes f(cr,9), it is equivalent to con- 
sider logf(cr, 9) instead of f(cr, 9) 

i-i 

logf(<7, 9) = log(H(0 + io-)) + log(n!) + 

i=l 

log((0 + l) n _x) + to g(f[{ £-^ }*) " log(A-0 ■ 

i=l 

Omitting the terms that do not depend on a and 9 and 
as (a) n = r ^(a)"^ where T is the gamma function, the 
previous equation becomes the function f * (a, 9) here 

where 

i-i 

f«M) = £fi M) M) 

i=l 

with fi M) (ct, 9) = log(0 + io) and 

fi 2) (a, 9) = - log r(0 + n) + log T(6 + 1) + 

n 

+ J2^ogT(i-a)-jlogT(l-a). 
»=i 

We observe that, if the point (a, 9) £ TZ does not sat- 
isfy the constraint 9 > —a only i^(a,9) becomes not 
defined. We apply a penalty value to i^\a,9) and to 
f ^ [a, 9) as described below. 
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Given a point Pi in the non-feasible region, Pi = 
{(7,0) G S, Pi, the closest point to Pi with respect to 
the euclidean distance that lies in the feasible region, is 
determined 

Pi = (a,9) = (~(a-e + e),^(9-a + e)) 

where e is a very small number to ensure that Pi is 
feasible, i.e. Pi G K n S. We used e = 0.001. The 
value of the function ^ is computed in Pi getting 
Yi = ^ 1A) {a,9) = loge. Then the value Y 1 of f^ 1 ' 1 ) 
in Pi is defined as fi M) 0, °) = i 1 + h)Yi where h 
is the euclidean distance between Pi and Pi, bi = 

\J \(& + 9 — e) 2 . In an analogous way, we apply this 
penalty method to all Pi = (ia, 9) that eventually fall 
in the non- feasible region S getting i^ p (a,9), the pe- 
nalized version of (er, 9), 



i=l 




log(6> + ia) 
(l + &,)log(e) 



if 0- 
if 9 



-ia > 

< 



,i = i,... , j-i , 



and 6, is the euclidean distance between Pi = (ia, 9) and 
Pi = (^(ia — 9 + e,^(9 — ia + e) determined as described 
above. The penalized version i^ p (a,9) of i^\a,9) is 
simply defined as 



®M) = < 



ffi 2) M) 
(l + MffM) 

(l-iOf^M) 



if + cr > 
if + zcr < 

and fi 2) (cr,6») < . 
if 9 + ia < 

and fi 2) (CT,6») > 



We observe that 



1. p < b p > b q p,q= l,...,j - 1; 

2. &i < ^(1 + e - 2<5). For S = 0.01 and e = .001 we 
get h < 0.694. 

Using the penalty method, an estimate (a s ,9 s ) is ob- 
tained finding the maximum of {^^{a, 9) = f^ p (a, 9) + 



4.4 Steps 5 and 6 

The estimate of the discovery probability at the next 
iteration, U s+ o(0), is computed as described in Sect.[3l 
Eq0J If its value is lower than p+ the algorithm stops, 
otherwise the next iteration s+1 is performed (if s + 1 > 
M* the algorithm stops). 



5 Conclusion 

Given an optimality crierion <f>, the problem of ^-optimal 
design generation has been addressed. A methodology 
to support the decision whether to continue or stop the 
search for optimal designs has been developed. It com- 
bines recent advances on discovery probability estima- 
tion, based on a Bayesian non parametric approach, 
Favaro et al ( 20121 ). with well known methods for opti- 
mal design generation. 

In principle, this methodology could be applied to 
any discrete optimisation problem. This topic will be 
part of future research. 

A software code, written in SAS, that makes use of 
the Proc Optex procedure, has been developed. 
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